The Everlasting Database: Statistical Validity at a Fair Price

Neural Information Processing Systems

We propose a mechanism for answering an arbitrarily long sequence of potentially adaptive statistical queries, by charging a price for each query and using the proceeds to collect additional samples. Crucially, we guarantee statistical validity without any assumptions on how the queries are generated. We also ensure with high probability that the cost for $M$ non-adaptive queries is $O(\log M)$, while the cost to a potentially adaptive user who makes $M$ queries that do not depend on any others is $O(\sqrt{M})$.


Reviews: The Everlasting Database: Statistical Validity at a Fair Price

Neural Information Processing Systems

This paper studies the problem of answering a (possibly infinite) sequence of (adaptive and non-adaptive) statistical queries without overfitting. Queries trigger the acquisition of fresh data when the mechanism determines that overfitting is likely, so adaptive queries necessitate new data. By continually acquiring fresh data as needed, the mechanism can (whp) guarantee accuracy in perpetuity. Moreover, by passing on the "cost" of data acquisition to queries that trigger it, the mechanism guarantees that (whp) non-adaptive queries pay cost O(log(# queries)) while adaptive queries pay cost O(sqrt(# queries)). Suggested applications are the normal ones for adaptive data analysis: ML competition leaderboards and scientific discovery.
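To get a feel for the gap between the two cost rates quoted above, here is a minimal sketch that tabulates $\log M$ against $\sqrt{M}$. It assumes every hidden constant in the $O(\cdot)$ bounds is 1, and the function name `cost_bounds` is hypothetical, so the numbers show growth only, not prices the mechanism would actually charge.

```python
import math
from typing import Tuple


def cost_bounds(num_queries: int) -> Tuple[float, float]:
    """Return (log M, sqrt M) for M = num_queries.

    These are only the growth rates of the stated bounds, with all
    constants assumed to be 1 (an assumption, not taken from the paper).
    """
    if num_queries < 1:
        raise ValueError("num_queries must be at least 1")
    return math.log(num_queries), math.sqrt(num_queries)


if __name__ == "__main__":
    # Compare the non-adaptive rate (log M) with the adaptive rate (sqrt M).
    for m in (10, 1_000, 100_000, 10_000_000):
        non_adaptive, adaptive = cost_bounds(m)
        print(f"M={m:>12,}  log M = {non_adaptive:6.2f}  sqrt M = {adaptive:9.2f}")
```

Under this assumption the adaptive rate pulls away quickly as $M$ grows, which is the sense in which adaptive queries bear more of the cost of fresh data.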


The Everlasting Database: Statistical Validity at a Fair Price

Woodworth, Blake E., Feldman, Vitaly, Rosset, Saharon, Srebro, Nati

Neural Information Processing Systems

Papers published at the Neural Information Processing Systems Conference.